AITopics | disease prediction

Collaborating Authors

disease prediction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments

Fong, Edwin, James, Lancelot F., Lee, Juho

arXiv.org Machine LearningDec-25-2025

Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious disease prediction, with a focus on predicting outbreaks in geographic regions that have historically reported zero cases. To this end, we present the detailed computational framework and experimental application of the Poisson Hierarchical Indian Buffet Process (PHIBP), with demonstrated success in handling sparse count data in microbiome and ecological studies. The PHIBP's architecture, grounded in the concept of absolute abundance, systematically borrows statistical strength from related regions and circumvents the known sensitivities of relative-rate methods to zero counts. Through a series of experiments on infectious disease data, we show that this principled approach provides a robust foundation for generating coherent predictive distributions and for the effective use of comparative measures such as alpha and beta diversity. The chapter's emphasis on algorithmic implementation and experimental results confirms that this unified framework delivers both accurate outbreak predictions and meaningful epidemiological insights in data-sparse settings.

diversity, prediction, reference county, (15 more...)

arXiv.org Machine Learning

2512.21005

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Indiana > Hamilton County > Fishers (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Stanford Sleep Bench: Evaluating Polysomnography Pre-training Methods for Sleep Foundation Models

Kjaer, Magnus Ruud, Thapa, Rahul, Ganjoo, Gauri, Moore, Hyatt IV, Jennum, Poul Joergen, Westover, Brandon M., Zou, James, Mignot, Emmanuel, He, Bryan, Brink-Kjaer, Andreas

arXiv.org Artificial IntelligenceDec-11-2025

Polysomnography (PSG), the gold standard test for sleep analysis, generates vast amounts of multimodal clinical data, presenting an opportunity to leverage self-supervised representation learning (SSRL) for pre-training foundation models to enhance sleep analysis. However, progress in sleep foundation models is hindered by two key limitations: (1) the lack of a shared dataset and benchmark with diverse tasks for training and evaluation, and (2) the absence of a systematic evaluation of SSRL approaches across sleep-related tasks. To address these gaps, we introduce Stanford Sleep Bench, a large-scale PSG dataset comprising 17,467 recordings totaling over 163,000 hours from a major sleep clinic, including 13 clinical disease prediction tasks alongside canonical sleep-related tasks such as sleep staging, apnea diagnosis, and age estimation. We systematically evaluate SSRL pre-training methods on Stanford Sleep Bench, assessing downstream performance across four tasks: sleep staging, apnea diagnosis, age estimation, and disease and mortality prediction. Our results show that multiple pretraining methods achieve comparable performance for sleep staging, apnea diagnosis, and age estimation. However, for mortality and disease prediction, contrastive learning significantly outperforms other approaches while also converging faster during pretraining. To facilitate reproducibility and advance sleep research, we will release Stanford Sleep Bench along with pretrained model weights, training pipelines, and evaluation code.

artificial intelligence, machine learning, stanford, (17 more...)

arXiv.org Artificial Intelligence

2512.09591

Country:

North America > United States (1.00)
Europe > Denmark > Capital Region (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.87)

Industry:

Health & Medicine > Therapeutic Area > Sleep (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes

Zhao, Yibowen, Zhang, Yinan, Su, Zhixiang, Cui, Lizhen, Miao, Chunyan

arXiv.org Artificial IntelligenceDec-10-2025

Predicting diseases solely from patient-side information, such as demographics and self-reported symptoms, has attracted significant research attention due to its potential to enhance patient awareness, facilitate early healthcare engagement, and improve healthcare system efficiency. However, existing approaches encounter critical challenges, including imbalanced disease distributions and a lack of interpretability, resulting in biased or unreliable predictions. To address these issues, we propose the Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework. KPI systematically integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy, which is particularly important for long-tailed diseases. Additionally, KPI utilizes large language models (LLMs) to generate patient-specific, medically relevant explanations, thereby improving interpretability and reliability. Extensive experiments on real-world datasets demonstrate that KPI outperforms state-of-the-art methods in predictive accuracy and provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.08261

Country: Asia (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.66)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Consumer Health (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction

Dao, Cong-Tinh, Phan, Nguyen Minh Thao, Ding, Jun-En, Wu, Chenwei, Restrepo, David, Luo, Dongsheng, Zhao, Fanyi, Liao, Chun-Chieh, Peng, Wen-Chih, Wang, Chi-Te, Chen, Pei-Fu, Chen, Ling, Ju, Xinglong, Liu, Feng, Hung, Fang-Ming

arXiv.org Artificial IntelligenceDec-1-2025

Electronic health records (EHRs) are designed to synthesize diverse data types, including unstructured clinical notes, structured lab tests, and time-series visit data. Physicians draw on these multimodal and temporal sources of EHR data to form a comprehensive view of a patient's health, which is crucial for informed therapeutic decision-making. Yet, most predictive models fail to fully capture the interactions, redundancies, and temporal patterns across multiple data modalities, often focusing on a single data type or overlooking these complexities. In this paper, we present CURENet, a multimodal model (Combining Unified Representations for Efficient chronic disease prediction) that integrates unstructured clinical notes, lab tests, and patients' time-series data by utilizing large language models (LLMs) for clinical text processing and textual lab tests, as well as transformer encoders for longitudinal sequential visits. CURENet has been capable of capturing the intricate interaction between different forms of clinical data and creating a more reliable predictive model for chronic illnesses. We evaluated CURENet using the public MIMIC-III and private FEMH datasets, where it achieved over 94\% accuracy in predicting the top 10 chronic conditions in a multi-label framework. Our findings highlight the potential of multimodal EHR integration to enhance clinical decision-making and improve patient outcomes.

data mining, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s13755-025-00396-w

2511.11423

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Grounding AI Explanations in Experience: A Reflective Cognitive Architecture for Clinical Decision Support

Shao, Zijian, Shen, Haiyang, Liu, Mugeng, Fu, Gecheng, Guo, Yaoqi, Wang, Yanfeng, Ma, Yun

arXiv.org Artificial IntelligenceSep-26-2025

Effective disease prediction in modern healthcare demands the twin goals of high accuracy and transparent, clinically meaningful explanations. Existing machine learning and large language model (LLM) based approaches often struggle to balance these goals. Many models yield accurate but unclear statistical outputs, while others generate fluent but statistically unsupported narratives, often undermining both the validity of the explanation and the predictive accuracy itself. This shortcoming comes from a shallow interaction with the data, preventing the development of a deep, detailed understanding similar to a human expert's. We argue that high accuracy and high-quality explanations are not separate objectives but are mutually reinforcing outcomes of a model that develops a deep, direct understanding of the data. To achieve this, we propose the Reflective Cognitive Architecture (RCA), a novel framework that coordinates multiple LLMs to learn from direct experience. RCA features an iterative rule refinement mechanism that improves its logic from prediction errors and a distribution-aware rules check mechanism that bases its reasoning in the dataset's global statistics. By using predictive accuracy as a signal to drive deeper comprehension, RCA builds a strong internal model of the data. We evaluated RCA on one private and two public datasets against 22 baselines. The results demonstrate that RCA not only achieves state-of-the-art accuracy and robustness with a relative improvement of up to 40\% over the baseline but, more importantly, leverages this deep understanding to excel in generating explanations that are clear, logical, evidence-based, and balanced, highlighting its potential for creating genuinely trustworthy clinical decision support systems. The code is available at \https://github.com/ssssszj/RCA.

explanation, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.21266

Country: Asia (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

A Structured Dataset of Disease-Symptom Associations to Improve Diagnostic Accuracy

Shafi, Abdullah Al, Zannat, Rowzatul, Muntakim, Abdul, Hasan, Mahmudul

arXiv.org Artificial IntelligenceSep-9-2025

Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. These datasets help identify symptom patterns associated with specific diseases, thus improving diagnostic accuracy and enabling early detection. The dataset presented in this study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered through analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports. Only the verified medical sources were included in the dataset, while those from non-peer-reviewed and anecdotal sources were excluded. The dataset is structured in a tabular format, where the first column represents diseases, and the remaining columns represent symptoms. Each symptom cell contains a binary value, indicating whether a symptom is associated with a disease. Thereby, this structured representation makes the dataset very useful for a wide range of applications, including machine learning-based disease prediction, clinical decision support systems, and epidemiological studies. Although there are some advancements in the field of disease-symptom datasets, there is a significant gap in structured datasets for the Bangla language. This dataset aims to bridge that gap by facilitating the development of multilingual medical informatics tools and improving disease prediction models for underrepresented linguistic communities. Further developments should include region-specific diseases and further fine-tuning of symptom associations for better diagnostic performance

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2506.1361

Country: Europe > Portugal (0.14)

Genre: Research Report > New Finding (0.35)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Add feedback

i-Mask: An Intelligent Mask for Breath-Driven Activity Recognition

Sinha, Ashutosh Kumar, Patel, Ayush, Dudhat, Mitul, Anand, Pritam, Mishra, Rahul

arXiv.org Artificial IntelligenceSep-8-2025

Human activity recognition (HAR) has gained significant attention due to its applications in health monitoring, intelligent environments, and human-computer interaction Hussain, Khan, Khan, Bhatt, Farouk, Bhola, and Baik (2024); Mishra, Gupta, Gupta, and Dutta (2022). Traditional HAR approaches employed wearable inertial sensors, vision-based methods, and environmental sensors for HAR. However, each method has inherent limitations such as discomfort, privacy concerns, or complex deployment requirements Wang, Huang, Zhao, Zhu, Huang, and Wu (2024); Mishra and Gupta (2025). The human body engages with its environment in diverse ways, one of which is the interaction between the lungs and the external environment through the act of breathing via the nose. The breathing pattern encompasses plenty of useful information that can be processed to fetch different behaviours and health information Mongelli, Orani, Cambiaso, Vaccari, Paglialonga, Braido, and Catalano (2020); Zhang, Wang, and Li (2024). Moreover, the breathing patterns are influenced by metabolic and physiological factors, offering a non-invasive and unobtrusive means of HAR.

data mining, machine learning, real time system, (20 more...)

arXiv.org Artificial Intelligence

2509.04544

Country: Asia > India (0.68)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (0.87)
Information Technology > Smart Houses & Appliances (0.68)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Architecture > Real Time Systems (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(4 more...)

Add feedback

BioNeuralNet: A Graph Neural Network based Multi-Omics Network Data Analysis Tool

Ramos, Vicente, Hussein, Sundous, Abdel-Hafiz, Mohamed, Sarkar, Arunangshu, Liu, Weixuan, Kechris, Katerina J., Bowler, Russell P., Lange, Leslie, Banaei-Kashani, Farnoush

arXiv.org Artificial IntelligenceJul-29-2025

Multi-omics data offer unprecedented insights into complex biological systems, yet their high dimensionality, sparsity, and intricate interactions pose significant analytical challenges. Network-based approaches have advanced multi-omics research by effectively capturing biologically relevant relationships among molecular entities. While these methods are powerful for representing molecular interactions, there remains a need for tools specifically designed to effectively utilize these network representations across diverse downstream analyses. To fulfill this need, we introduce BioNeuralNet, a flexible and modular Python framework tailored for end-to-end network-based multi-omics data analysis. BioNeuralNet leverages Graph Neural Networks (GNNs) to learn biologically meaningful low-dimensional representations from multi-omics networks, converting these complex molecular networks into versatile embeddings. BioNeuralNet supports all major stages of multi-omics network analysis, including several network construction techniques, generation of low-dimensional representations, and a broad range of downstream analytical tasks. Its extensive utilities, including diverse GNN architectures, and compatibility with established Python packages (e.g., scikit-learn, PyTorch, NetworkX), enhance usability and facilitate quick adoption. BioNeuralNet is an open-source, user-friendly, and extensively documented framework designed to support flexible and reproducible multi-omics network analysis in precision medicine.

artificial intelligence, bioneuralnet, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.2044

Country: North America > United States > Colorado (0.17)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.46)
Telecommunications > Networks (0.40)
Information Technology > Networks (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Trustworthy Chronic Disease Risk Prediction For Self-Directed Preventive Care via Medical Literature Validation

Le, Minh, Ton, Khoi

arXiv.org Artificial IntelligenceJun-24-2025

Chronic diseases are long-term, manageable, yet typically incurable conditions, highlighting the need for effective preventive strategies. Machine learning has been widely used to assess individual risk for chronic diseases. However, many models rely on medical test data (e.g. blood results, glucose levels), which limits their utility for proactive self-assessment. Additionally, to gain public trust, machine learning models should be explainable and transparent. Although some research on self-assessment machine learning models includes explainability, their explanations are not validated against established medical literature, reducing confidence in their reliability. To address these issues, we develop deep learning models that predict the risk of developing 13 chronic diseases using only personal and lifestyle factors, enabling accessible, self-directed preventive care. Importantly, we use SHAP-based explainability to identify the most influential model features and validate them against established medical literature. Our results show a strong alignment between the models' most influential features and established medical literature, reinforcing the models' trustworthiness. Critically, we find that this observation holds across 13 distinct diseases, indicating that this machine learning approach can be broadly trusted for chronic disease prediction. This work lays the foundation for developing trustworthy machine learning tools for self-directed preventive care. Future research can explore other approaches for models' trustworthiness and discuss how the models can be used ethically and responsibly.

artificial intelligence, chronic disease, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.1762

Country: North America > United States > Florida (0.28)

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications

Wu, Da, Wang, Zhanliang, Nguyen, Quan, Xu, Zhuoran, Wang, Kai

arXiv.org Artificial IntelligenceMay-12-2025

The scarcity of high-quality multimodal biomedical data limits the ability to effectively fine-tune pretrained Large Language Models (LLMs) for specialized biomedical tasks. To address this challenge, we introduce MINT (Multimodal Integrated kNowledge Transfer), a framework that aligns unimodal large decoder models with domain-specific decision patterns from multimodal biomedical data through preference optimization. While MINT supports different optimization techniques, we primarily implement it with the Odds Ratio Preference Optimization (ORPO) framework as its backbone. This strategy enables the aligned LLMs to perform predictive tasks using text-only or image-only inputs while retaining knowledge learnt from multimodal data. MINT leverages an upstream multimodal machine learning (MML) model trained on high-quality multimodal data to transfer domain-specific insights to downstream text-only or image-only LLMs. We demonstrate its effectiveness through two key applications: (1) Rare genetic disease prediction from texts, where MINT uses a multimodal encoder model, trained on facial photos and clinical notes, to generate a preference dataset for aligning a lightweight Llama 3.2-3B-Instruct. Despite relying on text input only, the MINT-derived model outperforms models trained with SFT, RAG, or DPO, and even outperforms Llama 3.1-405B-Instruct. (2) Tissue type classification using cell nucleus images, where MINT uses a vision-language foundation model as the preference generator, containing knowledge learnt from both text and histopathological images to align downstream image-only models. The resulting MINT-derived model significantly improves the performance of Llama 3.2-Vision-11B-Instruct on tissue type classification. In summary, MINT provides an effective strategy to align unimodal LLMs with high-quality multimodal expertise through preference optimization.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2505.05736

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > Spain > La Rioja > Logroño (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (0.88)
Health & Medicine > Therapeutic Area > Gastroenterology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback